Acoustic Scene Classification

Authors

  • Daniele Barchiesi
  • Dimitrios Giannoulis
  • Dan Stowell
  • Mark D. Plumbley
Abstract

In this article we present an account of the state of the art in acoustic scene classification (ASC), the task of classifying environments from the sounds they produce. Starting from a historical review of previous research in this area, we define a general framework for ASC and present different implementations of its components. We then describe a range of different algorithms submitted for a data challenge that was held to provide a general and fair benchmark for ASC techniques. The dataset recorded for this purpose is presented, along with the performance metrics used to evaluate the algorithms and the statistical significance tests used to compare the submitted methods. We use a baseline method that employs MFCCs, GMMs and a maximum-likelihood criterion as a benchmark, and find sufficient evidence to conclude that only three algorithms significantly outperform it. We also evaluate human classification accuracy on a similar task. The best-performing algorithm achieves a mean accuracy that matches the median accuracy obtained by humans, and the same pairs of classes are commonly misclassified by both computers and humans. However, all acoustic scenes are correctly classified by at least some individuals, while there are scenes that are misclassified by all algorithms.
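The baseline described above, one GMM trained per scene class on MFCC frames, with classification by maximum likelihood, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it uses scikit-learn's `GaussianMixture` and synthetic Gaussian data as a stand-in for real MFCC features (which would in practice be extracted from audio), and the class names and frame counts are arbitrary assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-in for an MFCC frame matrix (n_frames x n_coeffs);
# real systems would extract these coefficients from audio recordings.
def fake_scene_mfccs(mean, n_frames=200, n_mfcc=13):
    return rng.normal(loc=mean, scale=1.0, size=(n_frames, n_mfcc))

# Hypothetical training data: two scene classes with separable statistics.
train = {"street": fake_scene_mfccs(0.0), "office": fake_scene_mfccs(3.0)}

# Fit one GMM per acoustic scene class on that class's MFCC frames.
models = {label: GaussianMixture(n_components=4, random_state=0).fit(X)
          for label, X in train.items()}

def classify(mfccs):
    # Maximum-likelihood decision: score the recording's frames under
    # each class model and return the class with the highest likelihood.
    scores = {label: gmm.score(mfccs) for label, gmm in models.items()}
    return max(scores, key=scores.get)

print(classify(fake_scene_mfccs(0.0)))
print(classify(fake_scene_mfccs(3.0)))
```

With well-separated synthetic classes as above, the test recordings are assigned to the class whose model gives them the highest average log-likelihood.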


Similar Articles

Acoustic Scene Classification Using Spatial Features

Due to various factors, the vast majority of research in the field of Acoustic Scene Classification has used monaural or binaural datasets. This paper introduces EigenScape, a new dataset of 4th-order Ambisonic acoustic scene recordings, and presents preliminary analysis of this dataset. The data is classified using a standard Mel-Frequency Cepstral Coefficient Gaussian Mixture Model system, ...

Capturing the Acoustic Scene Characteristics for Audio Scene Detection

Scene detection on user-generated content (UGC) aims to classify an audio recording that belongs to a specific scene such as busy street, office or supermarket rather than a sound such as car noise, computer keyboard or cash machine. The difficulty of scene content analysis on UGC lies in the lack of structure and acoustic variability of the audio. The i-vector system is state-of-the-art in Spe...

IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events AN I-VECTOR BASED APPROACH FOR AUDIO SCENE DETECTION

The IEEE AASP Scene Classification challenge on user-generated content (UGC) aims to classify an audio recording that belongs to a specific scene such as busy street, office or supermarket. The difficulty of scene content analysis on UGC lies in the lack of structure and acoustic variability of the data. The i-vector system is state-of-the-art in Speaker Verification and Scene Detection, and is o...

Hierarchical learning for DNN-based acoustic scene classification

In this paper, we present a deep neural network (DNN)-based acoustic scene classification framework. Two hierarchical learning methods are proposed to improve the DNN baseline performance by incorporating the hierarchical taxonomy information of environmental sounds. Firstly, the parameters of the DNN are initialized by the proposed hierarchical pre-training. A multi-level objective function is t...

Gated Recurrent Networks Applied to Acoustic Scene Classification and Acoustic Event Detection

We present two resource efficient frameworks for acoustic scene classification and acoustic event detection. In particular, we combine gated recurrent neural networks (GRNNs) and linear discriminant analysis (LDA) for efficiently classifying environmental sound scenes of the IEEE Detection and Classification of Acoustic Scenes and Events challenge (DCASE2016). Our system reaches an overall accu...

Deep Neural Network Bottleneck Feature for Acoustic Scene Classification

Bottleneck features have been shown to be effective in improving the accuracy of speaker recognition, language identification and automatic speech recognition. However, few works have focused on bottleneck features for acoustic scene classification. This report proposes a novel acoustic scene feature extraction using bottleneck features derived from a Deep Neural Network (DNN). On the official ...


Journal:
  • CoRR

Volume: abs/1411.3715

Pages: -

Publication year: 2014